DETAILED ACTION

1. The text of those sections of Title 35, U.S. Code not included in this action can be found
in a prior Office action. 

Response to Amendment 

2. This communication is responsive to the applicant's amendment dated 05/29/2007. The 
applicant(s) amended claims 1,10-11,14, 22-24 (see the amendment: pages 2-7). 

The examiner withdraws the disclosure objection, because the applicant amended the 
corresponding content of the specification. 

The examiner withdraws the claim rejection under 35 USC 101, because the applicant 
amended the corresponding claim(s) and the applicant's arguments (see REMARKS in the
amendment: page 9) are persuasive. 

The examiner withdraws the claim rejection under 35 USC 112, second paragraph, because the applicant
amended the corresponding claim(s). 

Response to Arguments 

3. Applicant's arguments filed on 05/29/2007 with respect to the claim rejection under 35
USC 102/103 have been fully considered but are moot in view of the new ground(s) of rejection,
since the amended claims introduce new issues and/or change the scope of the claims. It is also
noted that the previously cited references are still applicable to the newly amended claims for the
prior art rejection (see below).

In response to applicant's arguments (REMARKS in the amendment: page 10) regarding the
rejection under 35 USC 103 that "Erten does not teach or fairly suggest that respective
parameters (e.g. the variance) of the Laplacian or Gaussian distributions are recursively updated,
based on the results of processing each successive frame, as is required by the present invention"
(page 10, paragraph 3), the examiner respectfully disagrees with applicant and has a different
view of prior art teachings and the claim interpretation. It is noted that Erten discloses the voice
extractor 26 implementing a mathematical model with parameter matrices (p42-p43) which are
updated with time index (k or t) (p73, p86), which reasonably reads on the claimed "recursively
updated". Erten also discloses a voice detector using windowed signals (successive frames)
and properties (parameters) including power, magnitudes, phase and statistical properties (p107-
p108), and teaches that 'voice signals tend to have Laplacian probability distribution', 'noise
signals... tend to have a Gaussian or Super-Gaussian probability distribution', and 'the variance
of extracted speech signal 28 or speech frequency bands 160' and 'various other statistical
measures... may be extracted as properties of speech and noise signals or frequency bands'
(p110), which means these properties (parameters) are updated on a per-windowed-signal (frame)
basis (which reads on "recursively updated"). Therefore, Erten discloses the argued limitation, based on
the broadest reasonable interpretation of the claim (such as claim 1).

For the above reason, the applicant's argument is not persuasive.

Claim Rejections - 35 USC § 103 
4. Claims 1-4, 6, 10-11, 16-18, 22, and 24-25 are rejected under 35 U.S.C. 103(a) as being
unpatentable over ERTEN (US 2002/0116187 A1).

As per claim 1, as best understood in view of the rejection under 35 USC 101 (see
above), ERTEN discloses 'speech detection' (title), comprising:

"decomposing a frame of the noise-contaminated signal received in a predefined time
period into decorrelated signal components" (Fig. 8, and paragraph (hereinafter referenced as
p) 106, 'time window (predefined time period)'; p107, 'frequency converter 158 generates
(decomposes) speech frequency bands... from windowed speech signal (frame) 152', 'implement
a fast Fourier transform (FFT) algorithm', wherein Fourier transform inherently decomposes the
windowed signal (frame) into uncorrelated signal components; Fig. 5 shows separation of
speech 60 and noise 30, which can also be read on the claim); and for each component:

"recursively updating respective parameters characterizing a Gaussian noise
distribution and a signal distribution of each of the respective components as a function
of time" (p42, 'parameter matrices' and 'continuous-time dynamics or discrete-time
state' (function of time); p49, 'mixing environment can be modeled as the following
nonlinear discrete-time dynamic processing model (function of time)'; p53, 'the update
law for dynamic environments (corresponding to recursively updating) is used to recover
the original signals' and 'environment 42 is modeled as linear dynamical system'; p110,
'voice signals tend to have Laplacian probability distribution' and 'noise signals... tend to
have a Gaussian or Super-Gaussian probability distribution'; p103, 'properties (also
corresponding to parameters) can convey any information' including 'power, statistical
properties, spectral properties, envelop properties, proximity...'; also see p43, p73, p86,
p107-p108);

"using the respective parameters to evaluate a [composite Gaussian and signal]
distribution function to provide estimate of noise and signal contributions to the
component" (p110, 'the variance (estimate)... may be used to determine (evaluate) the
presence of voice (corresponding to signal contributions)' and 'various other statistical
measures, such as kurtosis, standard deviation... may be extracted as properties
(estimates) of speech and noise signals or frequency bands (components)'; Figs. 2-5,
'mixed environment' (corresponding to a distribution function)); and

"attenuating the component in proportion to the estimated noise contribution to
the component" (p105, 'attenuator 142 attenuates extracted speech signals 28 based on
detection parameter 140'; p113, 'speech detected signal 212 has such noisy periods
attenuated' and Figs. 14-15 also show signal 212 is attenuated in proportion; p108,
'frequency band output 168 may include speech frequency band 160 scaled by the ratio
of speech in-band power to noise in-band power (implying the claimed limitation)').

But ERTEN does not expressly disclose the distribution function being a "composite
Gaussian and signal distribution function". However, as stated above, ERTEN teaches that
'voice signals tend to have Laplacian probability distribution' and 'noise signals... tend to have a
Gaussian or Super-Gaussian probability distribution' (p110), and processing the mixed signal in
'mixed environment' (Figs. 2-5). Therefore, it would have been obvious to one of ordinary skill
in the art at the time the invention was made to recognize that the mixed signal would have a
mixed (joint or composite) distribution that corresponds to the mixed environment, and to
combine the teachings of ERTEN by providing a mixed (joint or composite) distribution that
reflects the mixed signals with properties of Laplacian probability distribution (for speech) and
Gaussian probability distribution (for noise) in the mixed environment, because each of speech
and noise has its own probability distribution as suggested by ERTEN and the mixed signal is
necessarily associated with a mixed (joint or composite) distribution to reflect properties of the
mixed signal and noise distribution in the mixed environment.
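
For illustration only, the claimed step of "attenuating the component in proportion to the estimated noise contribution" might be sketched as below. This is a minimal sketch and is not taken from ERTEN or from the application: the function name attenuate_component, its parameter names, and the variance-ratio estimate of the noise contribution are all assumptions.

```python
def attenuate_component(component, noise_var, signal_var):
    """Scale one decomposed component down by its estimated noise share.

    ASSUMPTION: the noise contribution is estimated as the fraction of the
    total variance attributed to noise; the claim does not fix this formula.
    """
    total = noise_var + signal_var
    if total <= 0.0:
        # No variance estimate available: suppress the component entirely.
        return 0.0
    noise_fraction = noise_var / total
    # A larger estimated noise share gives a proportionally stronger attenuation.
    return component * (1.0 - noise_fraction)
```

Under this assumption, a component whose variance is attributed one quarter to noise keeps three quarters of its amplitude, and a component attributed wholly to noise is removed.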

As per claim 2 (depending on claim 1), the rejection is based on the same reason 
described for claim 1, because the rejection for claim 1 covers the same or similar limitation(s) 
as claim 2. 

As per claim 3 (depending on claim 1), the rejection is based on the same reason 
described for claim 1, because the rejection for claim 1 covers the same or similar limitation(s) 
as claim 3, wherein 'time window' and 'windowed speech signals' inherently include the 
claimed "a predefined number of samples" and FFT also inherently includes the claimed 
"applying a matrix transform". 

As per claim 4 (depending on claim 1), the rejection is based on the same reason 
described for claim 1, because the rejection for claim 1 covers the same or similar limitation(s) 
as claim 4, wherein 'FFT' inherently includes the claimed "mapping... from a time domain to a
frequency domain". 

As per claim 6 (depending on claim 1), the rejection is based on the same reason 
described for claim 1, because the rejection for claim 1 covers the same or similar limitation(s) 
as claim 3, wherein 'Fourier transform' inherently includes the sinusoidal functions as basis
functions as claimed. 

As per claim 10 (depending on claim 2), ERTEN discloses "using a value computed 
during processing of a previous frame were processed to determine which of the parameters 
characterizing the respective distribution to update" (Fig. 7 and p104-p105, 'detection parameter
... may be scaled... or... a binary value', which is used to 'attenuates (update) extracted speech
signal'; also see Fig. 8 and p108).

As per claim 11 (depending on claim 10), ERTEN does not expressly disclose "wherein 
the value computed during processing of a previous frame is an a priori probability that the frame 
constitutes noise, and using the a priori probability to select which of the parameters to update 
comprises: selecting a measure of variance that characterizes the Gaussian noise distribution if 
the a priori probability is below a predetermined threshold; and otherwise selecting a measure of 
variance factor that characterizes the Laplacian distribution." However, ERTEN teaches
using 'probability density of the Jth component (interpreted as a priori probability of
components, including noise)' (p47); 'speech likelihood signal may be a binary signal or may
express some probability that speech has been detected' (p114); 'a binary value resulting from
comparing the operation results to one or more threshold values' (p104); 'voice signals tend to
have Laplacian probability distribution... noise signals... tend to have a Gaussian or Super-
Gaussian probability distribution... thus voice signals can be said to be of low variance', 'the
variance of extracted speech signal or speech frequency bands may be used to determine the
presence of voice' and 'various other statistical measures... may be extracted as properties of
speech and noise signal or frequency bands' (p110). Therefore, it would have been obvious to
one of ordinary skill in the art at the time the invention was made to recognize that the likelihood
signal expressed by probability can be an a priori probability and is associated with Laplacian
(for speech) and/or Gaussian (for noise) probability distribution using the corresponding
variance, and to combine the teachings of ERTEN by providing (a priori) probability and the
associated Laplacian (for speech) or Gaussian (for noise) probability distributions using variance,
as suggested by ERTEN, for the purpose (motivation) of using various statistical measures for
extracting properties of speech and noise and/or producing separated speech and noise signals from
a mixed signal (ERTEN: p110 and p39).

As per claim 16 (depending on claim 11), as stated above, ERTEN discloses "computing a
measure of fit of the components to a composite Gaussian and Laplacian distribution" (as
described for claim 1; also see ERTEN: p103 and p110).

As per claim 17 (depending on claim 16), ERTEN further discloses "computing a 
measure of fit of each of the received components to a respective Gaussian noise distribution 
defined using the respective parameters; and comparing a mean of the measures of fit to the 
respective Gaussian noise distributions with a mean of the measures of fit to the composite 
Gaussian and Laplacian distributions, to compute a likelihood that the components of the frame 
constitute noise or noise-contaminated voice signal" (ERTEN: p103, 'properties (measures or
parameters)... may include... statistical properties (necessarily including mean value),
... averages (broadly interpreted as mean values)... model fitting values (including measure of
fit)'; p110, 'various other statistical measures'; Fig. 5 and p90, 'generates (comparing
result)... the difference between sound signal (corresponding to the composite Gaussian and
Laplacian distributions) from microphone m2 and filtered noise signal (corresponding to
Gaussian noise distributions)'; p113-p114, 'speech detected signal has such noise periods
attenuated' (detecting noise) and 'speech likelihood signal may be a binary signal' (implying
either speech with noise or background noise only); which corresponds to the claim).

As per claim 18 (depending on claim 17), ERTEN discloses "evaluating the distribution
at the value of the component received" (with same reason described above; also see ERTEN:
p110).

As per claim 22 (depending on claim 1), ERTEN does not expressly disclose "computing 
at least an approximation to an expected value of the composite Gaussian and signal distribution 
using a respective value of each component, and the parameters, to obtain a corresponding 
signal-enhanced component, if it is determined that the frame is signal active". However, 
ERTEN teaches generating 'one or more noise signal properties' including 'statistical
properties... average (approximation to an expected value)... model fit values (can also include
approximation to an expected value)' (p103), using 'Gaussian' and 'Laplacian probability
distributions' with 'various statistical measures (including approximation to an expected value,
such as the corresponding estimated sample value)' to 'determine the presence of voice' (p110);
'speech likelihood signal' and 'speech detector' (p114); and extracting 'noise signal' and
producing 'detected speech signal (obtain a signal-enhanced component)' (Fig. 5 and p90; and
Fig. 7 and p105). Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to recognize that a temporal (or ergodic) value of test samples can
be used as an approximation of a statistical expected (ensemble) value, such as a time average can
be an approximation of a mean (statistical expected) value, and to combine the different
teachings of ERTEN by providing an approximation to an expected value with Laplacian and
Gaussian (for noise) probability distributions, such as time average, suggested by ERTEN, for
the purpose (motivation) of extracting properties of speech and noise and/or producing separated
speech and noise signals from a mixed signal (ERTEN: p110 and p39).

As per claim 24, it recites an apparatus. The rejection is based on the same reason
described for claims 1 and 22, because the rejection for claims 1 and 22 covers the same or
similar limitation(s) as claim 24 (wherein 'speech likelihood signal' and 'speech detector' (p114)
reads on "voice activity detector" with the associated functionality as claimed), except the
limitation "an inverse signal transform for re-composing the frame of samples". However, this
feature is further disclosed by ERTEN (p40, 'transform function inversion'; Fig. 8 and p109,
'combiner 170 performs... by an inverse-FFT to generate detected speech signal 34').

As per claim 25 (depending on claim 24), ERTEN discloses "the clean speech estimator 
computes an expected value of each of the composite Gaussian and Laplacian distributions to 
independently derive a speech-enhanced component corresponding to each of the components" 
(p110, 'the variance (expected value) of extracted speech signal 28 or speech frequency bands
(components) may be used to determine (evaluate) the presence of voice (corresponding to signal
contributions)' and 'various other statistical measures, such as kurtosis, standard deviation (also
expected values)... may be extracted as properties of speech and noise signals or frequency
bands (components)'; Fig. 8 and p108-p109, 'any property of speech frequency band or noise
frequency band may be used' including 'statistical properties'; 'combiner 170 combines 
frequency band output (a speech-enhanced component) 168 for each speech frequency band 160 
to generate detected speech signal'). 

5. Claims 5, 7-9 and 26 are rejected under 35 U.S.C. 103(a) as being unpatentable over
ERTEN in view of admitted prior art disclosure, hereinafter referenced as ADMISSION.

As per claim 5 (depending on claim 4), ERTEN does not expressly disclose "mapping
the frame comprises applying a discrete cosine transform to the frame of samples". However,
the feature is well known in the art as evidenced by ADMISSION who teaches that 'there are
many known transforms for decomposing (mapping) a frame of samples' and 'the most common
of these include the frequency-domain transforms such as the Fourier transform, and the discrete
cosine transform (DCT), wavelet decomposition transforms such as the standard wavelet
transform (SWT), and adaptive transforms like the Karhunen-Loeve Transform' (p5-p6 in the
section of "Background of the Invention" of the specification). Therefore, it would have been 
obvious to one of ordinary skill in the art at the time the invention was made to modify ERTEN 
by providing a transform using DCT for the decomposition, as taught by ADMISSION, for the 
purpose (motivation) of providing low complexity decomposition technique (ADMISSION: p6). 

As per claim 7 (depending on claim 6), ERTEN does not expressly disclose decomposing 
the frame into "wavelets". However, the feature is well known in the art as evidenced by 
ADMISSION who teaches that 'there are many known transforms for decomposing (mapping) a 
frame of samples' and 'the most common of these include the frequency-domain transforms such 
as the Fourier transform, and the discrete cosine transform (DCT), wavelet decomposition 
transforms such as the standard wavelet transform (SWT), and adaptive transforms like the 
Karhunen-Loeve Transform' (see p5 and p7 in the section of "Background of the Invention" of 
the specification). Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to modify ERTEN by providing a transform using DCT for the 
decomposition, as taught by ADMISSION, for the purpose (motivation) of better representing 
discontinuities for the signal (ADMISSION: p7).

As per claims 8-9 (depending on claim 6), ERTEN does not expressly disclose
"recomputing the basis functions to adaptively optimize decomposition" and "applying an 
adaptive Karhunen-Loeve transform". However, the feature is well known in the art as evidenced 
by ADMISSION who teaches that 'there are many known transforms for decomposing 
(mapping) a frame of samples' and 'the most common of these include the frequency-domain 
transforms such as the Fourier transform, and the discrete cosine transform (DCT), wavelet 
decomposition transforms such as the standard wavelet transform (SWT), and adaptive 
transforms like the Karhunen-Loeve Transform' (p5 and p7 in the section of "Background of the 
Invention" of the specification). Therefore, it would have been obvious to one of ordinary skill 
in the art at the time the invention was made to modify ERTEN by providing a transform using 
DCT for the decomposition, as taught by ADMISSION, for the purpose (motivation) of 
maximizing the capacity of the basis functions to present the signal (ADMISSION: p7). 

As per claim 26 (depending on claim 25), the rejection is based on the same reason 
described for claim 5, because the claim recites the same or similar limitation(s) as claim 5. 

6. Claims 12-13 are rejected under 35 U.S.C. 103(a) as being unpatentable over ERTEN in 
view of VALVE et al. (US 6,707,910 B1), hereinafter referenced as VALVE.

As per claim 12 (depending on claim 11), ERTEN does not expressly disclose "the a
priori probability is defined by evaluating a hidden state of a hidden Markov model". However, 
the feature is well known in the art as evidenced by VALVE who discloses 'detection of the 
speech activity of a source' (title), comprising using 'HMMs (hidden Markov models: statistical
models)' having 'probability density function (pdf: corresponding to a priori probability)' for
'speech activity detection' (col. 9, lines 22-49). Therefore, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to modify ERTEN by providing 
HMMs with pdfs for speech activity detection, as taught by VALVE, for the purpose 
(motivation) of improving speech activity detection by utilizing statistical information (VALVE: 
col. 9, lines 9-13). 

As per claim 13 (depending on claim 12), ERTEN in view of VALVE discloses 
"incrementally changing the parameter in accordance with a difference between an expected 
value of the component given the past value of the parameter, and the value of the component 
received" (ERTEN: p53, 'the update law for dynamic (incrementally changing) environments is
used to recover the original signals' and 'environment 42 is modeled as linear dynamical
system'; p110, 'statistical measures (parameters)', such as 'variance', 'kurtosis' and 'standard'
can be interpreted as expected value; wherein HMMs inherently include determining
difference(s) (state changing) between current parameter(s) and the past value of the
parameter(s), as claimed).

Allowable Subject Matter
7. Claims 14-15, 19-21 and 23 are objected to as being dependent upon a rejected base 
claim, but would be allowable if rewritten in independent form including all of the limitations of 
the base claim and any intervening claims. 

The following is a statement of reasons for the indication of allowable subject matter:

Regarding claim 14, the instant application is directed to a method for discriminating
noise from signal in a noise-contaminated signal. The dependent claim, combining all well known
features in its parent claim(s), further identifies the uniquely distinct features of:

wherein recursively updating a parameter further comprises incrementally changing the
parameter in accordance with a difference between an expected value of the component given the
past value of the parameter, and the value of the component received (from its parent claim 13);
and 

wherein incrementally changing comprises applying a first order smoothing filter to the 
components (claim 14). 
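
As a hedged illustration of the claim 14 language, a first order smoothing filter applied to a variance parameter might look like the sketch below. It is not drawn from the application: the function name update_variance, the smoothing constant alpha, and the choice of the squared component as the observed value are assumptions, made because a zero-mean component with variance v has an expected squared value of v.

```python
def update_variance(prev_var, component, alpha=0.1):
    """One recursive step of a first order (exponential) smoothing filter.

    ASSUMPTION: the parameter is a variance, its expected observation given
    the past value is prev_var, and the received value is component squared.
    """
    observed = component * component
    # Move the parameter by a fraction alpha of the (received - expected) difference.
    return prev_var + alpha * (observed - prev_var)


# Example: track the variance across the same component in successive frames.
variance = 1.0
for value in [0.5, -0.2, 2.0, 0.1]:
    variance = update_variance(variance, value)
```

A small alpha makes the parameter change slowly from frame to frame; a value near 1 makes it follow the most recent frame almost entirely.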

Regarding claim 19, the instant application is directed to a method for discriminating
noise from signal in a noise-contaminated signal. The dependent claim, combining all well known
features in its parent claim(s), further identifies the uniquely distinct features of: 

wherein comparing a mean of the measures of fit comprises dividing a product of the 
measures of fit of the components to the composite Gaussian and Laplacian distribution by a 
product of the measures of fit of the components to the noise distribution. 
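
The division of products recited in claim 19 can be sketched as a likelihood ratio. The sketch below is illustrative only: the function names, the use of probability densities as the "measures of fit", and in particular the equal-weight mixture used as a stand-in for the composite Gaussian and Laplacian distribution are assumptions, and the application may define the composite distribution differently.

```python
import math


def gaussian_pdf(x, var):
    """Zero-mean Gaussian density with variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def laplacian_pdf(x, scale):
    """Zero-mean Laplacian density with scale parameter scale."""
    return math.exp(-abs(x) / scale) / (2.0 * scale)


def composite_pdf(x, var, scale, weight=0.5):
    # ASSUMPTION: a weighted mixture stands in for the composite distribution.
    return weight * gaussian_pdf(x, var) + (1.0 - weight) * laplacian_pdf(x, scale)


def likelihood_ratio(components, variances, scales):
    """Product of composite fits divided by product of noise-only fits.

    The products are accumulated as sums of logarithms so that long frames
    do not underflow; very large components can still drive the Gaussian
    density to zero, which this sketch does not guard against.
    """
    log_ratio = 0.0
    for x, var, scale in zip(components, variances, scales):
        log_ratio += math.log(composite_pdf(x, var, scale))
        log_ratio -= math.log(gaussian_pdf(x, var))
    return math.exp(log_ratio)
```

Under these assumptions, a ratio above 1 favors noise-contaminated signal and a ratio below 1 favors noise alone.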

Regarding claim 23, the instant application is directed to a method for discriminating
noise from signal in a noise-contaminated signal. The dependent claim, combining all well known
features in its parent claim(s), further identifies the uniquely distinct features of: 

wherein computing at least an approximation comprises computing a piece-wise linear 
function approximation of the expected value as a function of the parameters and the component. 



Application/Control Number: 10/620,453 Page 15 

Art Unit: 2626 

Regarding claims 15 and 20-21, the statement for the allowable subject matter is based
on the same reason described for claims 14 and 19 (see above), because these dependent claims
inherit all limitations of their parent claims 14 and 19 respectively.

8. The prior art of record, ERTEN (US 2002/0116187 A1), VALVE et al. (US 6,707,910
B1), and ADMISSION, provides numerous teachings and techniques of speech detection,
including extracting speech signal/noise signal from mixed input signal, estimating the signal
characteristics, extracting features/parameters using spectral properties and statistical properties,
providing feed forward and feedback voice extractor models, updating the matrix parameters of
the models, using windowed signal and FFT transform, obtaining variance and various other
statistical measures based on Laplacian probability distribution of voice signal and Gaussian
probability distribution of noise, attenuating/scaling the extracted signal; applying known
transforms for decomposing signal including DCT, wavelet and Karhunen-Loeve Transform;
using HMM with probability density function (a priori probability) for speech activity detection.
However, the combined features stated above are not anticipated by, nor made obvious over, the
prior art of record.

Conclusion 

9. Applicant's amendment necessitated the new ground(s) of rejection presented in this 
Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). 
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A 
shortened statutory period for reply to this final action is set to expire THREE MONTHS from 
the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the
mailing date of this final action and the advisory action is not mailed until after the end of the 
THREE-MONTH shortened statutory period, then the shortened statutory period will expire on 
the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be 
calculated from the mailing date of the advisory action. In no event, however, will the statutory 
period for reply expire later than SIX MONTHS from the date of this final action. 

10. Please address mail to be delivered by the United States Postal Service (USPS) as 
follows: 

Mail Stop 

Commissioner for Patents 

P.O. Box 1450 

Alexandria, VA 22313-1450 
or faxed to: 571-273-8300, (for formal communications intended for entry) 
Or: 571-273-8300, (for informal or draft communications, and please label 
"PROPOSED" or "DRAFT") 

If no Mail Stop is indicated below, the line beginning Mail Stop should be omitted from 
the address. 

Effective January 14, 2005, except correspondence for Maintenance Fee payments, 
Deposit Account Replenishments (see 1.25(c)(4)), and Licensing and Review (see 37 CFR 5.1(c) 
and 5.2(c)), please address correspondence to be delivered by other delivery services (Federal 
Express (Fed Ex), UPS, DHL, Laser, Action, Purolator, etc.) as follows:

U.S. Patent and Trademark Office 

Customer Window, Mail Stop 

Randolph Building 

Alexandria, VA 22314

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Qi Han whose telephone number is (571) 272-7604. The
examiner can normally be reached on Monday through Thursday from 9:00 a.m. to 7:30 p.m. If 
attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, 
Richemond Dorvil, can be reached on (571) 272-7602. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Inquiries regarding the status of submissions 